Multi–Domain Learning: Analysis, and Methods for Multi–Attribute Domains

Authors

  • Mahesh Joshi
  • Noah A. Smith
  • Mark Dredze
Abstract

A common assumption in many machine learning techniques is that the data points are independent and identically distributed (i.i.d.). However, often data can be divided into subgroups of data points that are related in some way, and this violates the assumption that they are identically distributed. Such subgroups are commonly referred to as domains or subpopulations. Domain information can be used to learn better machine learning models, and multi–domain learning techniques provide one way of using domain information in data. An important question when using multi–domain learning techniques is that of defining how a given dataset is divided into domains. Often some metadata attribute associated with the instances is used for defining domains. In this thesis, we consider the impact of the definition of domains on multi–domain learning, and propose approaches that can handle the case where domains can be defined for a given dataset in more than one way. We first present an empirical analysis of existing multi–domain learning methods, with the aim of understanding how the definition and properties of domains influence their performance. We show that the performance of multi–domain learning techniques can be affected by two factors: (i) an ensemble learning effect due to classifier combination; and (ii) the distribution of class labels across the different domains. We then show that it is possible to design a problem–driven approach to multi–domain learning. We propose a feature representation that is motivated by knowledge about the domains available in the data. Our feature representation explicitly accounts for the structural similarity among syntactic features across multiple domains, even when the domains can be defined in more than one way. Finally, we present learning methods that go beyond the current multi–domain learning paradigm, which assumes a single way of dividing the data into domains.
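As a rough illustration of what it can mean for domains to be defined in more than one way, the sketch below applies feature augmentation in the style of Daumé III's "frustratingly easy" domain adaptation, with one feature copy per metadata attribute value. This is a well-known baseline swapped in for illustration and is not necessarily the representation proposed in the thesis. The function name `augment_features` and the metadata attributes `category` and `source` are hypothetical.

```python
from typing import Dict


def augment_features(
    features: Dict[str, float], metadata: Dict[str, str]
) -> Dict[str, float]:
    """Copy each feature into a shared version plus one version per
    metadata attribute value (hypothetical helper, illustration only)."""
    augmented: Dict[str, float] = {}
    for name, value in features.items():
        # Shared copy: lets a linear model learn behavior common to all domains.
        augmented[f"shared::{name}"] = value
        # One copy per attribute value: lets the model learn domain-specific
        # deviations under every way of dividing the data into domains.
        for attribute, attr_value in sorted(metadata.items()):
            augmented[f"{attribute}={attr_value}::{name}"] = value
    return augmented


# Assumed example: a review with two bag-of-words features and two
# metadata attributes, each of which could define the domains on its own.
example = augment_features(
    {"word=great": 1.0, "word=slow": 1.0},
    {"category": "books", "source": "site_a"},
)
```

The augmented dictionary can be passed to any linear classifier that accepts sparse named features. The feature count grows linearly with the number of metadata attributes, which is one reason efficiency matters when several attributes are used at once.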
For many text classification tasks, multiple metadata attributes associated with the text can influence the behavior of textual features as well as the performance on the task. The different metadata attributes can have varying utility for the purpose of defining domains for multi–domain learning. Choosing a single metadata attribute to define domains in such cases may not be optimal. We propose methods that allow the use of multiple metadata attributes for defining domains, leading to better models that are still efficient.

Acknowledgments

The pursuit of a Ph.D. is a challenging[1] journey. However, it is made enjoyable by the many souls that touch upon the life of an aspiring Ph.D. in numerous ways. This is my heartfelt attempt to express gratitude to all those wonderful folks who have guided and supported me in my journey.

I am very grateful to my thesis advisors: Carolyn and William. Carolyn has been my advisor since my Master’s days at Carnegie Mellon, and has patiently guided me throughout the years, despite my at–times–wild research ideas. She gave me the freedom to pursue my research interests, while also bringing me back on track when the focus of my work seemed diluted. She also brought to my work the much–needed linguistic perspective that is valuable in thinking about any problem in natural language processing. She has taught me to think deeply about research problems, and to always meaningfully question and challenge one’s own work. William has been the wise sage on my committee, with his calm demeanor and very thoughtful advice on all matters, technical or non–technical. I am very thankful to him that he agreed to be my co–advisor starting Fall of 2010, when I needed a core machine learning perspective for my work going forward. He has taught me that seemingly small things can matter in research, and therefore nothing is too small to pay attention to when doing research.

My thesis committee members, Noah A. Smith and, in particular, Mark Dredze, have played a very significant role in shaping my dissertation work. Noah has always amazed me with his quickness in grasping the core of whatever I described to him (and my descriptions got pretty verbose at times), despite long periods of time between our meetings. His energy is contagious, and I have always come away from his office feeling enthusiastic about my work. I have learned a lot from my interactions with him, including my collaboration with him on work in text–driven forecasting. Mark is without a doubt the best “external” thesis committee member that one could hope for, and a lot more than that. I doubt if anyone else in that role would even imagine being on a phone call at 1 a.m. before a paper deadline, discussing paper edits, and the best way to present results in order to make a point. He has been almost like a third advisor to me starting from the time I did my thesis proposal. He has always asked me the most penetrating questions, and guided me patiently in finding out the answers.

Prior to coming to Carnegie Mellon, my thesis advisors at the University of Minnesota Duluth, Richard Maclin and Ted Pedersen, were instrumental in igniting my interest in machine learning and natural language processing. I am fortunate that I got to work with them at the time, and it paved my way to Carnegie Mellon.

Many amazing teachers have influenced my academic pursuit starting from my school years, and I am thankful to all of them. I would like to particularly mention Mrs. Mitali Chaudhury, Mrs. Mokashi, Mrs. Nazare, Mr. Rajput, and Ms. Rose from my school years; Mr. Gadgil, and Mr. Jadhav from my high school years; and Mr. Kajave, and Mr.

[1] Some have said it is harder than having a baby. However, since I cannot ever experience (thankfully!!) the physical ordeal involved in having a baby, I will skip that comparison.

Similar resources

What's in a Domain? Multi-Domain Learning for Multi-Attribute Data

Multi-Domain learning assumes that a single metadata attribute is used in order to divide the data into so-called domains. However, real-world datasets often have multiple metadata attributes that can divide the data into domains. It is not always apparent which single attribute will lead to the best domains, and more than one attribute might impact classification. We propose extensions to two ...

A Comparative Study of Multi-Attribute Continuous Double Auction Mechanisms

Auctions have long been used as a competitive method of buying and selling valuable or rare items. Single-sided auctions in which participants negotiate on a single attribute (e.g. price) are very popular. Double auctions and negotiation on multiple attributes create more advantages compared to single-sided and single-attribute auctions. Nonetheless, this adds the complexity of the auctio...

Sensitivity Analysis of Simple Additive Weighting Method (SAW): The Results of Change in the Weight of One Attribute on the Final Ranking of Alternatives

Most of the data in a multi-attribute decision making (MADM) problem are unstable and changeable, so sensitivity analysis after problem solving can effectively contribute to making accurate decisions. This paper provides a new method for sensitivity analysis of MADM problems so that by using it and changing the weights of attributes, one can determine changes in the final results of a decision ma...

Automatic Domain Partitioning for Multi-Domain Learning

Multi-Domain learning (MDL) assumes that the domain labels in the dataset are known. However, when there are multiple metadata attributes available, it is not always straightforward to select a single best attribute for domain partition, and it is possible that combining more than one metadata attribute (including continuous attributes) can lead to better MDL performance. In this work, we prop...

Sensitivity Analysis in the QUALIFLEX and VIKOR Methods

The sensitivity analysis for multi-attribute decision making (MADM) problems is important for two reasons: First, the decision matrix as the source of the results of a decision problem is inaccurate because it sorts the alternatives in each criterion inaccurately. Second, the decision maker may change his opinions in a time period because of changes in the importance of the criteria and in the ...

Sensitivity Analysis of TOPSIS Technique: The Results of Change in the Weight of One Attribute on the Final Ranking of Alternatives

Most of the data in multi-attribute decision making (MADM) problems are changeable rather than constant and stable. Therefore, sensitivity analysis after problem solving can effectively contribute to making accurate decisions. In this paper, we offer a new method for sensitivity analysis in multi-attribute decision making problems in which if the weight of one attribute changes, then we can dete...

Publication date: 2013